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1 Introduction 

In this research effort, the usefulness of hp - version finite elements and adap- 
tive solution-refinement techniques in generating numerical solutions to op- 
timal control problems has been investigated. Under NAG-939, a general 
FORTRAN code was developed which approximated solutions to optimal 
control problems with control constraints and state constraints [1,2]. Within 
that methodology, to get high-order accuracy in solutions, the finite element 
mesh would have to be refined repeatedly through bisection of the entire 
mesh in a given phase. In the current research effort, the order of the shape 
functions in each element has been made a variable, giving more flexibility 
in error reduction and smoothing. Similarly, individual elements can each be 
subdivided into many pieces, depending on the local error indicator, while 
othei 1 parts of the mesh remain coarsely discretized. The problem remains to 
reduce and smooth the error while still keeping computational effort reason- 
able enough to calculate time histories in a short enough time for on-board 
applications. 

As an aid in evaluation of the error for use in solution refinement, in 
optimal control problems, the integral of the boundary value problem (the 
Hamiltonian) can be made to equal zero. Using this information alone, a 
reasonable error indicator can be developed, but as is shown in later sections, 
it is not consistently accurate and only tends to model parts of the error well. 
However, the Hamiltonian can be used as a check of the more sophisticated 
error estimators. 

The error estimator currently being developed is based on work done by 
Estep, et al . , [3], for initial value problems. This will involve the solution of 
a dual problem, a linear differential equation of the same order as the main 
problem, the behavior of which indicates the overall level of error in the 
main problem, /ip-refinements will then be build around this error indicator 
to attempt to equidistribute the error through the time interval and attempt 
to provide a minimum of error for a given computational effort. It remains 
to be investigated if this refinement process can be accomplished in a short 
enough time 

The rest of this paper describes the work that has already been done 
in developing an adaptive scheme for implementing the bp-version of the 
finite element method to solve optimatl control problems. The family of 
optimal control problems that can currently be solved will be defined in 
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section 2. Next in section 3, the variables of the problem will be modelled 
using shape functions, thereby establishing the algebraic equations to be 
solved. In section 4, some textbook-style problems will be solved using with 
this method and the results will be analyzed. In section 5, the Hamiltonian 
will be shown to be a reasonable, but inconsistent, error measure, along 
with some preliminary results in adaptively refining solutions to textbook 
problems. Following that will be the preliminary development of the residual 
error indicator discussed above. Finally in section 6, an outline of future 
research work will be given. 

2 Optimal Control Problems 

A FORTRAN code has been developed which uses hp-version finite elements 
to approximate solutions to a particular subset of optimal control problems. 
In this section this subset of problems will be defined, and the equations to 
be solved using finite elements will be established. 

2.1 Problem formulation 

Systems being studied are governed by general, nonlinear differential equa- 
tions 

x-f(x,u,t) x £ R n ‘,ue R n “,t €[t 0 ,t f ] (1) 

where the state vector x describes the state of the system, u is a vector 
of control variables, and t is the time. Although the methodology and the 
code are not so restricted, for simplicity’s sake, this discussion is confined 
to problems with only a single set of differential equations in a single time 
interval. Furthermore, the functions / are assumed to be differentiable with 
respect to their arguments, and the states are assumed to be continuous, 
from initial time t 0 (presumed to be zero) to final time */, where t f can be 
fixed or free. 

General boundary conditions on the states can be specified at the initial 
time, the final time, or some combination of both, in the form 

*[*(<o),*(</),t/l = 0 tf (2) 

Let J be a cost functional to be minimized that can contain both a scalar 
penalty on the states at the endpoints to or tj plus an integral penalty on 
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(3) 


the states, controls, and time: 

J = (f>[x{t 0 ),x(t f ),t f \ + f ' L(x,u,t) dt 

J <0 

The optimal control problem is then to find the control vector time history 
u(t) which causes the system governed by Eq. (1) to meet the boundary con- 
ditions (2) such that the given cost functional (3) is minimized. Admissible 
control histories axe assumed to be bounded and continuous. 

This formulation also allows for inequality constraints of the form: 

g(x,u ) <0 g € R n * (4) 

no more than n u of which can be active at any one time. The function 
g need not be a function of the states, but it must be a function of the 
control vector. Otherwise it falls under the category of state constraints, 
which require special handling that has not been developed yet under this 
methodology. 

The constraints in Eq. (4) are enforced through use of slack variables, k, 
such that (4) is replaced by the equality constraints, 

gi(x, u) + kf = 0 i = 1 . . . n g (5) 

To simplify notation, a vector of these squared slack variables is defined such 
that 

Ki = h? i = 1 . . . Ti g ( 6 ) 

2.2 Calculus of variations 

In using calculus of variations to minimize a cost functional subject to con- 
straints [4, 5], the residuals of the differential equations, control constraints, 
and boundary conditions are adjoined to the original cost function by means 
of Lagrange multipliers A (t), / j( / ) , and v respectively. This yields a new cost 
function J' such that 

J' = <t>[x(to),x(t f ),t f ] + u T *[x(t 0 ),x(t f ),tf] 

( 7 ) 

+ J* 1 { L(x , u, t) -f A T [/(x, u, t) - x] + fi T [g(x, u) + AT]} dt 
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which equals J if all the constraints are satisfied. The functions A (t) are 
known as the costates. 

Next, to simplify notation, define quantities 

H = L + A T f + n T (g + K) (8) 

$ = </> + v T yi (9) 

where H is known as the Hamiltonian of the system. Also define xj = x(tj) 
and xq = x(to), thus reducing J' to 


/= <Z>(x Q ,x f ,t f ) + J*' [H(x,u,t)-X T x] dt (10) 

To satisfy the first-order necessary conditions for a local minimum, the 
authors of Ref. [5] find an extremal of J 1 by varying the control vector, 
with the variation in the state vector (and other variables) being a result 
of the variation in the control vector. Here instead, the development in [1] 
is followed in which the first variation of J' is taken, allowing independent 
variations in the states, state rates, controls, Lagrange multipliers, slack 
variables, and final time, yielding 


d<j> d$ 

6J' = Su T ^ + —dx(t 0 ) + - — dx(t f ) + 

OXq OX f 


°± + H -\ Ti 


dti 


rh \dH dH T e\T, t \ 

+ / 6x + -r — Su — A J 6x + 6X 1 (/ — x) 

Jt 0 [ ox ou 

+ 6n T (g + K ) + dt 


( 11 ) 


where 

6Ki = 2ki6ki i = 1 ...n g (12) 

This results in a weaker form of the equations, though as will be shown, the 
equations developed in [5] are all still satisfied. 

The variations denoted by 6 are variations holding time fixed (appropriate 
for variations of quantities under an integral), while those with a d are total 
variations or differentials, which allow time to vary (appropriate for varying 
quantities at an end point). These variations are related at the end points 

by 


dx(t Q ) = Sx(t 0 ) 

dx(tf) = 6x(tf) + x(tf)dtf 
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(14) 



since tf is allowed to vary while to is not. 

In an attempt to group terms by their variational coefficient, one can 
remove the time derivatives of variational parameters by integrating A T 6x 
by parts and expand the total variations at the end points. Eq. (11) then 

nprnmM 

6J' = 6u T V + ^6x(to) + !?- 6x(t f ) - \ T H' 0 

OX o OX f 


+ f|^-A T l x(t f )dt f + (^ + h) dt f 

l dx f J t, Jt, (15) 

+ [ ’ [ir-<5r + -zrf> u + A T &r + r (f - x ) 

Jt 0 [ OX uU 

+ 6n T (g + K) + ^ t 6K] dt 

Defining subscripts on H to denote partial derivatives and rearranging terms 
gives: 

tr = ^ + (f + a ) tt dt > + (H - aT ),„ 6xM 

+ (wr xT ),; Mit ’ + {^- xT ),, 6xM (16) 

+ J tf [HJu + HJx + X t 6x + 6 \ T (f - x) 

+ S/j, T (g + K) + h t 6K] dt 


Now, defining H and A as 


A t = 


t = t o 


HZ' * = *! 
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and combining terms yields: 


6J' = 6v T <H + {H + H) t dt f + [(A - A) T dx]*' 

+ [ tf \HJu + HJx + \ t 6x + 6X T f + 6X T x ( 19 ) 

Jto 1 

+ Sfi T (g + K) + dt 

In an analagous way to finding the extremum of a scalar function of 
many variables, the first order necessary condition for a extremal solution 
to the functional J' is that 8J' — 0 for arbitrary independent variations in 
the states, costates, time, slack variables and other Lagrange multipliers. 
Since this includes the case where all the variations except any one are zero, 
this condition implies that for this cost functional to have a minimum, all 
the quantities multiplied by each of the variations must be zero. At the 
boundaries, this results in the following equations: 


A(t 0 ) + A(t 0 ) = 0 (20) 

*(</) + *(«/) = o (21) 

<b = 0. (22) 

Also, if the final time is indeed free to vary, then this relation must also hold: 

H + H = 0 (23) 


otherwise, dtf = 0 and Eq. (23) can be ignored. In appendix A the difference 
between these two cases will be discussed. 

The integral in Eq. (19) is made equal to zero in continuous time by 
enforcing the following equations: 

X T (t) + H x (x, A,u,t) = 0 (24) 

x(t) — f(x,u,t) = 0 (25) 

H u = 0 (26) 

(hereby refered to as the costate equations, state equations, and the optimal- 
ity condition respectively), and the control constraint equations: 

5 + ^ = 0 (27) 
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2 pik{ = 0 i = 1 . . . rig (28) 

These equations are equivalent to the Euler-Lagrange equations developed 
in [5] and define a two-point boundary value problem in x and A. 

To approximately solve this problem using hp- version finite elements, a 
form of these equations will be developed such that all the boundary condi- 
tions are enforced weakly through the use of Lagrange multipliers. This will 
allow shape functions to be chosen that do not need to meet the boundary 
conditions, thus allowing the same set of shape functions to be used to solve 
a wide class of optimal control problems. This way the number of equations 
to be solved is not affected by the boundary conditions, except for fixed vs. 
free £/. 

To look at the equations on the interior of the time interval, first assume 
the boundary conditions are satisfied. Eq. (19) then becomes: 

6J' = f' [HJu + H x 6x - 6x T A + 6X T f + S\ T x 

j t o ^ j 

+ 6p T (g + K) + p t 6K] dt 

At this stage, the analogy can be drawn between variational methods 
and Galerkin methods. Interpreting the variations in the variables as test 
functions, the equation 6J' = 0 becomes the representation of the residuals 
in meeting Eqs. (24 - 28) being orthogonal to the test functions (Sx, 6u , . . .). 
This observation will become important when we later look at a posteriori 
error estimators. 

2.3 Discretization of the problem 

Again looking at Eq. (29) as a variational problem, to solve 6J' = 0, the 
infinite dimensional problem is simplified by projecting the true unknown 
solution onto a finite set of piecewise polynomials. The variations are then 
approximated using piecewise polynomials of appropriate order to generate 
a set of nonlinear equations. 

Obviously, different choices for these polynomials yield different sets of 
equations. Previous work [2] used discontinuous, piecewise-constant poly- 
nomials for the main variables. This choice is prudent for problems which 
have discontinuities in the states or costates, which many optimal control 
problems do, so in this effort, discontinuous shape functions will be used. 
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Another advantage is that in using discontinuous polynomials, the sim- 
plest version of those (piecewise constants) allows the integrals in Eq. (29) to 
be done by inspection rather than by numerical quadrature. Even the sim- 
plest continuous polynomial approximations will require numerical quadra- 
ture over an element. 

As Eq. (29) stands though, with the x and A terms under the integral, 
using discontinuous shape functions results in jump-terms at each node point 
when the time interval is discretized, adding to the complexity (specifically, 
the dimension) of the problem. To avoid this, continuous approximations for 
6x and <5 A are used, and the terms A T 6x and 6X T x are integrated by parts. 
Eq. (29) then becomes: 

6J 1 = X T 6xf' - 6X T x\ t / 0 + [ tf \HJu + H x 6x - A T Sx + 6X T f 

0 0 Jto L (30) 

+ 6X T x + Sfi T (g + K) + /z t <5AT] dt 

Next, the time interval is broken up into N not necessarily equal length 
time elements, At,-, such that the time at each element boundary (or node ) 
t,- is calculated as: 


h = < 0 (31) 

tj = i -f- At,- i = 2, . . . , IV +1 (32) 


and the states and controls at these nodes are defined to be 

x, = x(ti) i = 1, . . . , N + 1 

U{ = u{ti) i — 1, — ,iV + l 

The time within the i th element, t, is expressed as t,- = t,-_i 
0 < t < 1 and 


(33) 

(34) 

+ rAt,- where 


so that 


r = 


t,- t«— i 

At, 


= J_^Li = J_(.v 

dt At, dr Ati 


(35) 

(36) 
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Substituting these relationships into Eq. (30) gives: 

sf = \ T 6xf£ 

N -if fj\*T fatT 

+ jr A ti jf <5w,- + (#*),• (^ 7 ) 

+«r/.+<tfta +*)+!?«] * 

In this equation, a subscript i on a variable refers to the value of that 
variable within the i th element, while the subscript on a function indicates 
the value of that function evaluated using variables within the i th element. 


3 Shape Functions 

In this section, the specific shape functions alluded to in the previous section 
will be defined and used to solve Eq. (37). What will result is a set of 
nonlinear algebraic equations to be solved numerically. 

Following the work of [8], to avoid jump-terms while also keeping the 
shape functions as simple as possible, C° shape functions for the variational 
quantities are implemented in each element. These are in terms of nodal 
values ( : ) and polynomial coefficients on element interiors ( 7 ). 


n »-i 

6 A, = <5A,(1 — t) + <5A,+it + ^ (1 — T)T(3j(T)6\ij (38) 

j = i 
"*-1 

Sxi = £i,(l - r) + 6x i+ iT + j] (1 - r)r^(r)6% (39) 

j = i 

Here n b is the order of the shape function polynomial being used, with 
n b = 1 the summation would be ignored, and this represents what is used 
in the h - version development. The functions pj(r) are Jacobi polynomials of 
order (J — 1) as detailed in [9]. 

The time derivative of these expressions is of the form 

= fi x '. = AtiSii = — 6£j + 6xj+i + 52 7 j( r )6xij ( 40 ) 

dr j=i 
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where 


(41) 


7i( T ) = 1(1 - ^)t/3j(t)]' 

Similar expressions hold for the costates. 

For the states and costates shape functions are chosen that are only con- 
tinuous within the element: 



o 

II 

<H 

o 

II 

fc. 

<-r 



n b 

Xi = < 

52 0 < T < 1 A,- = < 

52 0 < r < 1 (42) 


j= 1 

j= 1 


„ i«+l r= 1 

Ai+l T = 1 


where the functions a^r) are polynomials of order j — 1 as derived in [8] to 
simplify the algebraic equations later on. Note that x t - and A, are discrete 
values, distinct from the shape functions within the element. The boundary 
conditions on the states and costates will be enforced weakly through these 
nodal values. 

For the controls, control constraint Lagrange multipliers, slack variables, 
and their variations, again no time derivatives exist in Eq. (37), so the same 
shape functions are used: 


6ui = Y, Q j( T ) s Uij (43) 

i 

Sfii = 52 <*j{T)6jkj Mi = 52 otj{T)6kij (44) 

j = l i=i 

u < = $2 a i( r ) s «i 0 < r < 1 (45) 

i=i 

Hi = 52 0 < r < 1 fc« = 52 a j( T )kij 0 < T < 1 (46) 

3 = 1 3=1 

No nodal values of the controls, slack variables, or control constraint 
Lagrange mulipliers are needed in this problem formulation. They can be 
calculated as needed after the fact through the optimality condition and the 
control constraint equations of the previous section, Eqs. (26) - (28). 
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where 

e i( r ) = (1 ~ T ) T Pj( T ) ( 48 ) 

Note that the only nodal values of the states and costates that appear here 
are at the endpoints of the time interval. The other internal nodal values 
can be generated from the state equations once the other nodal and interior 
values have been solved for. 
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Because the polynomials a(r) and 7 (r) were chosen to be orthogonal, 
Eq. (47) reduces to 


6J f — A^ +1 (5x;v+i — Affoi — ^w+i^Aiv+i Xj 6X\ 


N 




+ E A< « / ajWSuij 




+ Sxf (H x )i(l - t) + 6xJ +1 (H x )iT + «f/,(l - r) + 6 Af +1 / t r 

+ E + € i( r )^5/«] 

j=i 

T 


n» 




V =1 


n» 


.9 1 + [ E 


U=i 


(49) 






n fc 


+ 2 hr a i( r )A*i E a i( r )^i E a ;( r )^; > <*r 


0‘ =1 


U =1 


i=i 




+ E -^fx M +6Af +1 


AT 


+ E ^a m -^iA 


i = 2 / 

_ \ 

1,1 d* ^«. 7 — 1 Ajj 1 

i=2 / 
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Arranging terms by their variational coefficient gives 
6 f = 6A[ J-xi + x 1A - A*i jf (1 - t)/i dr] 

. r r 1 

I + E 6 *T [-*<-14 - y 0 r /«- 1 dr 

+ x .,1 - At; f (1 - r)/i dr 

+e + a< . / dr 

»=i j=i J0 

+^5r+i *N4 A f;v t f ff dr + x/v+i] 

+5f[ A x - A lt i - A ti j (1 - r)(/f x )i dr] 

+ E^f [— Ai t i - A U / (1 - r)(J/.),- dr 
•=2 L - 70 i ( 50 ) 

+ A,_ u - At,_i J r(H x )i - 1 dr] 

+ E E 1 «5 [a,J + i + M f 1 €j(r)(H x )i dr] 

i=i i=i 1 • /0 J 

f— Ajv+i + Ajv.i — At^r jf t(H x )s dr] 

+ E II %> A *> / (- ff «)‘ Q i( r ) dT 

i j = l 0 

+ EE^,M / a j( r ) 9 . + f E a i(r)^'i 

. i-1 • /0 [ V=1 

rj r ( T r 

+ EE^0 A << / 2aj(r) E a j( r )^«i E a i( r )^'> dr 

i j=i Jo b=i J b =1 J 

The variational coefficients are independent and arbitrary, due to the 
weak form of the equations. Therefore, for the variation of J' to be zero, 
the expressions multiplied by the coefficients must be identically zero. In 
this way, the first-order necessary conditions for optimal control are approx- 
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imated. To supplement the equations derived from setting Eq. (50) equal 
to zero, the boundary conditions need to be enforced, as provided in the 
previous section: 


? rr d<t> T d 

Al + dT 0 +l/ 5xo 

= 0 

(51) 

>-> 

+ 

1 

4? ■©- 

1 

05 CD 
£ § 

= 0 

(52) 

. d<f> T d$ 

H(‘f) + + v T w 

utf Otf 

= 0 

(53) 

^{x 0 ,x N+1 ,tf) 

= 0 

(54) 


Thus, the two-point boundary value problem (20) - (28) is approximated 
by the set of nonlinear algebraic equations (50) - (54). When a problem 
has multiple phases (whether by discontinuity of the states or the differential 
equations, or both), the equations are similar and can also be handled by the 
code. The differences are that the user has to specify extra boundary con- 
ditions, including whatever will trigger the discontinuity, and then the code 
handles the corresponding jump conditions for the costates and Hamiltonian 
automatically. 

In the following sections, how these equations are implemented will be ex- 
plained, and example problems will be shown which demonstrate the use and 
performance of ftp-version finite elements. Following that will be a discus- 
sion of how to improve the solution adaptively by gauging the error in each 
interval, followed by some preliminary results using this refinement technique 


4 Implementation and Results 

A numerical solution which makes the expressions from Eq. (50) equal to zero 
is found using a restricted-step Newton-Raphson method, implemented in a 
FORTRAN code. A restricted-step Newton-Raphson method differs from a 
standard Newton-Raphson method in that if the full Newton step yields a 
higher value of the objective function than the starting point, the step size 
is halved repeatedly until an improving step is achieved. 

The determination of the Newton step requires the solution of a linear 
system of equations. The Jacobian of these equations can be very large as 
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shown in Figure 1, which is for the problem defined in section 4.1. The 
equations for the most part only depend on neighboring elements, so the 
Jacobian ends up being quite sparse, as seen in Figure 2 for the same problem. 
That feature is exploited by the use of sparse linear systems solvers from the 
Harwell subroutine library [11]. These routines use a special sparse version 
of Gaussian elimination with partial pivoting, where information is retained 
between calls to the routine, so subsequent linear systems to be solved using 
the same Jacobian structure (as is common in determining different Newton 
steps) or simply different right-hand sides (as is common in repeatedly step 
halving) take considerably less time. 

The symbolic-manipulation package MACSYMA developed by Symbolics 
[12] is used to generate analytical partial derivatives of the system equations, 
boundary conditions, and control constraints for use in the Jacobian. MAC- 
SYMA is also used to generate the a(r) and e(r) polynomials from Eq. (50), 
which come from a recursion formula involving derivatives and integrals of 
polynomials, as developed in [8]. The user specifies the order of the poly- 
nomials, and MACSYMA generates the necessary expressions, but the user 
does have to supply initial guesses at the polynomial coefficients. 

The integrals in the equations are approximated using Gauss-Legendre 
integration [10], with the user selecting the number of Gauss points, which 
at this time is constant for all of the integrations. No fewer Gauss quadrature 
points should be chosen than the order of the shape functions being used plus 
one. This is because the zeros of the shape functions correspond to zeros of 
polynomials used in Gauss-Legendre integration. More Gauss points may be 
used, and that has shown to be more accurate in some problems, but not 
all, nor has the value of this extra computational effort been established. In 
Refs. [6] and [7], it is established that the minimum error (or most accurate 
stresses) in finite element approximations occur at the Gauss points. 

4.1 Unconstrained problem 

In perhaps the simplest representation of an aerospace problem with non- 
linear system dynamics, this first problem involves the maximum velocity 
transfer to a particle of mass m to a specified horizontal flight path in a fixed 
time (see [5], pg. 59). The mass is acted on by a force of constant magnitude 
ma and variable heading f3(t). The states for this problem are position of 
the particle x (horizontal) and y (vertical) and the corresponding velocity 
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components u and v. The differential equations for this system are then: 

x = u 

y = v 

u = a cos (3 
v = a sin j3 

with initial conditions corresponding to a zero velocity at the origin and with 
terminal constraints that the particle is in horizontal flight at a given height 
(assumed here to be 1), yielding a boundary condition vector, 

= u(0) 

\&2 = u ( 0 ) 

$3 = Z(0) 

$4 = y(0) 

^5 = y(t f ) - 1 
*6 = v(tf ) 

The final horizontal position x(tf ) is unspecified, and the final horizontal 
velocity u(tj) is to be maximized. The cost function is then 

J = u(tf) 

with the boundary conditions enforced by setting VP = 0. 

Reference [5] gives the analytic solution in terms of the initial force head- 
ing angle, the final time, and the final altitude in unspecified units. These 
values were chosen to be 75°, 1, and 1 respectively. 

The code was run for this problem for a variety of combinations of finite- 
element parameters. Figures 3 and 4 show the square root of the integral 
of the squares (i.e. the two-norm) of the relative error time histories for the 
states and controls as a function of the CPU time involved in calculating 
solutions. The data points correspond to the 2-, 4-, 8-, 16-, and 32-element 
solutions. The initial guess for each case was extrapolated from the con- 
verged solution for the case of half as many elements (or for one fewer shape 
function coefficient if only 2 elements). All of the specified boundary condi- 
tions were met to within 10 -12 for the case when all the elements are evenly 



(56) 
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spaced. Figures 5 and 6 show the the same error data vs. the number of free 
parameters in the problem, which equals the dimension of the Jacobian. 

Once an initial solution was obtained for a low-order shape function and a 
small number of elements, the code converged easily as the parameters were 
changed. Not surprisingly, the overall errors reduced as the order of shape 
functions increased and as the number of elements increased. 

What is surprising is the particularly good performance of the first-order 
shape function through the higher error regions. It isn’t until errors get below 
10 -6 in the states that the higher shape function orders get significantly 
better, with each of the other shape function orders being best for each 
subsequent 2 orders of magnitude error. That pattern gets pushed even 
further down and wider with the control variables. First-order is best all the 
way down to 10 -8 with each subsequent higher-order shape function being 
best seemingly for another 3 or 4 orders of magnitude, until the code runs 
into problems with round-off error. 

Later these results will be compared with those when the mesh is refined 
using an adaptive scheme. For now these results show the increase in accuracy 
that is possible using higher-order shape functions in solving optimal control 
problems. 

4.2 Example with control constraint 

Next a problem with a control constraint was studied, again from Ref. [5]. 
The problem is to minimize the cost function 

J = ^(</) 2 + \ j 0 ' u2(it ( 58 ) 

where x and u are scalars. The system dynamics axe governed by 

x = h(t)u (59) 

for some function h(t), subject to two control inequality constraints, 

gi = u — 1 < 0, and 

92 = -(u + 1) < 0 

An exact solution is available [2] if the final time is chosen to be 10, the 
initial condition is a given constant, and 

= ( 61 ) 
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The exact value for the state at the final time is x(tf ) = —17/39 and the 
optimal control is 


— x(tf)h(t ) 0 < < < 2 

1 2 <<<11/3 

—x(tf)h(t) 11/3 <<< 8 

-1 8 <<<10 

This problem is not as well-behaved numerically as the previous problem, 
and so solutions are not readily available for the sequences of distributions 
of elements. Errors do reduce significantly as the mesh is refined and the 
shape function orders are increased, but showing those results will be deferred 
to Section 5 when they will be compared to results using adaptive error- 
reduction techniques. 

The penalty of increased CPU time associated with using higher-order 
shape functions could be serious enough to thwart using this methodology to 
solve problems in real time. However, many of the element interior unknowns 
can be eliminated at the element level by solving a small set of nonlinear 
algebraic equations in which the nodal values are taken as given, splitting 
the problem (and hence the Jacobian) into outer and inner loops for the 
nodal and interior values to an element respectively. The scheme may then 
turn out to be especially powerful in a parallel computing environment since 
a different processor could be assigned to each element. The number of 
processors, strictly speaking, would not be required to be any larger than 
the number of sub-regions which are free of discontinuities. 

Another way to bring down the computational effort while still reducing 
and smoothing errors is by more prudent selection of the element mesh and 
shape function orders. Starting with a coarse discretization, the optimal 
distribution of elements and higher-order shape functions could be found 
that minimizes the error for a given number of parameters. In the next 
section some indicators available to gauge error for use in such a process will 
be examined. 
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5 Implementation and Results with Adap- 
tivity 

To develop a means of adaptively adjusting mesh and shape function pa- 
rameters, subroutines were developed to split a given element or raise the 
shape function order if a certain error criterion was met. Initial guesses are 
then determined for any new parameters, and then the new (larger) set of 
equations is solved again using the Newton-Raphson iteration. 

This section describes two possible error indicators, one involving the 
calculation of a single function and one more elaborate method involving 
the solution to an adjoint problem. The simpler one has been implemented, 
and the corresponding results will be described first. The more elaborate 
error indicator is still in development, so the development of it will be more 
sketchy. 

5.1 Hamiltonian as error indicator 

The Hamiltonian is a logical choice for an error criterion since it is a first 
integral of the two point boundary value problem derived from the first-order 
necessary conditions for optimal control. In the problems under considera- 
tion, it contains all of the controls and costates, at least some of the states, 
and all of the state rates. Thus looking at how the Hamiltonian varies from 
its optimal value can be a valuable gauge for the magnitude of the error in 
the variables in the problem. 

The development in appendix A shows how the optimal control problem 
can be cast such that the Hamiltonian at the optimal solution is always zero. 
With the Hamiltonian having been transformed in this way, the difference 
between H(t) and the optimal (namely zero) can now easily be measured. 
What remains to be seen is how good of an indicator of error (either pointwise 
or integrated) in the solution that the Hamiltonian is. 

The two criteria for parameter adjustment that have been looked at most 
closely are the deviation from zero of the Hamiltonian at any given node and 
the jump that the Hamiltonian makes between two adjacent nodes. 

The two problems studied were the same as in Section 4. The first set of 
results is for the unconstrained problem in Section 4.1. The problem was first 
solved for 2 elements. At this point, the Hamiltonian was a non-zero constant 
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across both elements, so both elements were split. The resulting Hamiltonian 
and relative error distributions are shown in Figure 7. The “Total Error” is 
the two-norm of the relative errors in all the states, costates, and control; the 
“ x-u error” includes only errors in the states and controls; and the “u error” 
includes only the error in the control. Relative error was unavailable at the 
endpoints for the states since they were constrained to be zero. Similarly, 
the optimal value of the control at the midpoint of the time interval is zero, 
so that data point will also be missing in each of these plots. 

As can be seen, the Hamiltonian jumps the most in the two central el- 
ements. Using a criterion that elements should be split if the Hamiltonian 
jump across them was 10% of the maximum jump along the trajectory, the 
code split these two central elements, resulting in the smoother Hamiltonian 
distribution in Figure 8. 

At this point, all the Hamiltonian jumps were about the same, so all 
the elements were split, resulting in the 12-element solution shown in Figure 
9, where again some peaks developed, which were smoothed out in the 16- 
element solution of Figure 10. This process can continue until point the 
Hamiltonian and the overall error in all variables is below 10 -10 . 

In each case, the Hamiltonian peaks near the middle elements picked up 
the peak error in the control in that region. Meanwhile the peak in the 
overall state error (including the two states which did not appear in the 
Hamiltonian) seemed to correspond to a secondary peak in the Hamiltonian. 
The total error was dominated by the error in one of the costates which was 
a constant. Even so, the total error was in the same order of magnitude as 
the error in the Hamiltonian. 

In the second example with the control constraints from Section 4.2, 
jumps in the Hamiltonian seem to better highlight regions where the mesh 
should be refined. Starting with 7 elements, so as not to have a node fall upon 
an optimal entrance or exit point for a control constraint, the Hamiltonian 
is constant within each phase of the problem, whether on or off the control 
constraint, as shown in Figure 11. The code senses the jumps in the Hamil- 
tonian near each of the switching points and put more nodes there, resulting 
in the 16-element solution in Figure 12. To compare, Figure 13 shows how 
the error and nodes would be distributed with 17 uniformly spaced elements. 
The errors are an order of magnitude higher. Figure 14 shows the results of 
a couple of optimizing runs done on the results from Figure 13. 

In all the plots for this problem, the control error is necessarily zero 
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along constrained arcs. Similarly, the Hamiltonian error was always zero in 
the last constrained arc since the Hamiltonian is actually enforced to be zero 
at the final time. With a constant control in that region, the equations can 
be integrated exactly. The plots show that to bring the error down in the 
eaxly regions, the switching points have to be nailed down precisely, which 
is exactly what happens when optimization is based on Hamiltonian jumps. 
Unfortunately, the code can only make one set of element splits before re- 
solving the problem. This bisection technique would take a long time to reach 
the switching points exactly. When state constraint capability is added to 
the code, the endpoints of a control constraint arc can then become variables. 

Thus, element splitting based on Hamiltonian jumps seems to do a reason- 
able job of smoothing and reducing error, and the Hamiltonian itself seems 
to be a reasonable measure of errors within the elements, but the results 
are not consistant enough. The Hamiltonian, while easy to calculate from 
given information and a reasonable gauge for error, lacks the information 
about how well the differential equations are being approximated. Also lack- 
ing is a sense of how finely to discretize a given element, as demonstrated 
in the control constraint problem. The next a posteriori error estimator to 
be investigated is being designed to gauge the residual errors in meeting the 
equations within each element. 

5.2 Residual error technique 

As mentioned in section 2.2, the variational equations can be interpreted 
as residuals multiplied by test functions, the definition a Galerkin method. 
Since Galerkin methods minimize the two-norms of the residual errors, for- 
mulating the problem in this framework will help greatly in determining an 
error estimator. In this section, the preliminary work done in analyzing this 
system to find adequate residual error estimators will be presented. 

As a starting point, the analysis will not treat complete optimal control 
problems, but rather two point boundary value problems (TPBVP) without 
controls, using the work of Estep et al. [3] in initial value problems as a 
basis. The extra constraints of the optimality condition and any control 
constraints will be added later. Likewise, only time-varying linear systems 
have been examined thus far, and each state must have exactly one given 
boundary condition. Dr. Estep’s continued collaboration in developing this 
error estimator is acknowledged and appreciated. 
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The states are first split depending on at which endpoint the boundary 
condition is specified. For this case, the system being analyzed has the form 

y = A(t)y 0 <t<t f 

Vl( 0) = y 0 (63) 

ydtf) = yj 

where y € R n is partitioned as 


Using the continuous Galerkin finite element method, the object is to find 
the polynomial Y € C q such that: 

J [Y - A(t)Y, v] dt = 0 Vu € D q ~ l 

n(o) = yo (65) 

^R^f) = y/ 

where C q is the set of all piecewise polynomials on [0, < /] of order q which are 
continous across element boundaries. D q is the set of all piecewise polynomi- 
als on [0,</] of order q which are not continuous across element boundaries. 

Subtracting Eqs. (63) and (65), and defining the error as e = Y — y, one 
yields 

J [e — A(t)e, v] dt = 0 Vi> € D q ~ l 

e L (0) = 0 ( 66 ) 

e R(tf) = 0 

where e is partitioned similarly to y as 




Integrating Eq. 66 by parts yields: 


J 1 [e - A{t)e,v) dt = e(tf) T v(tf) - e(0) T u(0) + JJ [e, —v — A(t) T v 
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where 


e(tf) T v(t f ) = e L (tf) T v L (t f ) 
e(0) T v(0) = e*(0) T M°) 

Now let v be the solution of 

—v — A(t) T v = e(t) 0 < t < tf 
v L (tf) = 0 
MO) = 0, 

reducing the right hand side of Eq. (68) extensively: 

[ ' ||e|| 2 dt = f f [e — A(t)e , u] dt 
Jo Jo 

Adding in Eq. (63) gives: 

J*' \\ef dt = J*' [Y - A(t)y,v] dt 

Defining n q as a projection operator into the space of polynomials of order 
q, Eq. (65) implies that 7r 9 _iu is orthogonal to Y — A(t)Y, and hence it can 
be added to the right hand side of the dot product: 

J 1 ||e|| 2 dt = J ’ [y — A(t)Y, v — 7t 9 _iu] dt (73) 

Now, Y 6 D q ~ l so it is orthogonal to v — 7T 9 _ii>, reducing the above to: 

[ h ||e|| 2 dt = f' [~A(t)Y, v - «■,_!»] dt (74) 

JO Jo 

By a similar argument, any other function in D ,_1 can be added on the left 
side of the dot product, so in order to make the term on the left small, add 
in the projection of A{t)Y : 

f 1 ||e|| 2 dt=[' {7r,_i [-A(t)y] - A(t)Y , TTq—\v + v} dt (75) 
Jo Jo 

Using standard calculus integral inequalities plus the inequality that 

f \v - K q -iv) dt < [ ' IMI dt, (76) 

Jo Jo 


(69) 

(70) 

(71) 

(72) 
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Eq. (75) becomes: 

[ f \\e\\ 2 dt < f 1 \\v\\ dt ■ max k m ■ max\\ A(t)Y - ir^A^YW] 

Jo Jo m<n tei m J (77) 

= S • raax(fc m • R m ) 

m<n 

where k m is the length of the mth element, I m (out of n), S is the sensitivity 
function, and R m is the residual on the mth element. Note that different 
choices for inequalities to use result in different error estimators, which may 
happen to suit optimal control problems better. That will be examined as 
work continues. 

With a target integral error of TOL, the original finite element problem 
(65) is solved once, and the residuals are calculated. At the first iteration, 
the new time interval lengths k$ can be computed for each m as 

® = (TOL) 2 j (S • *£> • Rm) Vm € [1, n] (78) 

km 

with S assumed to be one. For subsequent iterations, the dual problem 
(70) is solved with the forcing function e(t) approximated as the difference 
between the approximate solutions from the previous two iterations. Then S 
is calculated and used in calculating the next mesh refinement. In this way 
the solution is refined, equidistributing the error. 

This methodology has been implemented into FORTRAN and tested on 
two time-varying, linear systems, but the results are still preliminary. The 
next step will be to include nonlinear dynamics, but from previous work 
of Estep [3], the main difference will be that the Jacobian of the system 
dynamics will replace the A(t) matrix in the dual problem, retaining its linear 
nature, and indeed making it a reasonably trivial calculation compared to 
solving the huge system of nonlinear equations. Once that is working, the 
next step is to add scalar functions and constraints to the two-point boundary 
value problem, which should also be a straightforward extension. 

6 Summary and Future Work 

An updated version to GENCODE, developed by Hodges and Bless [1], (but 
without state constraints) has been developed to solve a variety of opti- 
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mal control problems using /ip- version finite elements with adaptive mesh- 
parameter adjustment. Textbook problems with non-linear system dynamics 
and control constraints have already been solved for a variety of combinations 
of finite element parameters, including adaptively modifying the distribution 
of mesh parameters with the Hamiltonian as an error indicator. The Hamil- 
tonian has been shown to be an inadequate error indicator in and of itself, 
necessitating more sophisticated measures if optimal mesh-parameter design 
is to be done. The first steps toward a residual error measure have been 
taken and look promising. 

To that end, research will need to continue to be done concerning the 
techniques for error estimation and adaptive adjustment of mesh parameters. 
The residual error techniques will have to be further refined and tested, 
while analysis needs to be done to verify the experimental evidence already 
obtained regarding the use of the Hamiltonian as an error indicator. 

The methodology to handle state constraints, a realistic part of many 
aerospace applications, will have to be developed and implemented. Once 
that capability is added, control constraints will be able to be handled with 
variable endpoints as zeroth order state constraints, if desired, greatly im- 
proving the accuracy in solving those kinds of problems. That way any 
adaptive routine will not have to iterate as much to determine the optimal 
switching points. 
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A Reformulating the Hamiltonian 

In this section, we will develop the properties of the Hamiltonian and re- 
formulate it to become a convenient measure of error in the system. The 
Hamiltonian will be studied for the general class of optimal control prob- 
lems studied in section 2, but without control constraints as that simply 
complicates notation. 


27 



Borrowing notation from section 2, define the Hamiltonian in the standard 
way as 

H = L(x,u,t) + X T f(x,u,t) (79) 

From the first order necessary conditions for optimality (see section 2), 
the condition on the Hamiltonian at the final time is determined from 


, d<t> T d V 
H(t f ) + -^ + u T — 

Otf utf J 


dt f = 0 


(80) 


If tf is fixed, then dtf = 0, and the Hamiltonian can remain free at the 
final time. Otherwise dtf is a free variation and thus 


d<j> 

dtf V dtf 

at the optimal solution. In either case, no conditions are imposed on the 
Hamiltonian at any time other than the final time. Often tf is not constrained 
or penalized. In this case, Eq. (81) reduces to 




H(t f ) = 0 (82) 

In the case when tf is a fixed number t, the code is set up to assume that 
tf is still a free variable. This way the same set of equations is solved by the 
code for any general problem. The user has to add the extra constraint 


tfp+l -tf-T 

and Eq. (81) becomes the dummy equation 


(83) 


H(tf) — 2/p+l, 


(84) 


where u p+ \ is the associated extra dummy unknown. Whatever the code 
decides the optimal value of H should be at the final time from the other 
equations, i/ p+ i will be set to that. 

To get an idea of the behavior of the Hamiltonian along the optimal 
trajectory, take the time derivative: 


• dH . dH ■ dH . dH 

s7 I+ dI A + 'Sr“ + aT 


(85) 
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For the optimal trajectory, the first two terms cancel, and the third is 
zero, reducing (85) to 

* = £ mH> W 

By definition, if the system is autonomous, H t = 0, which implies that H = 0, 
and thus 

H(t) = H(t,\ (87) 

though that’s not an enforced constraint. 

If H(tf) were a known quantity, this could be used as an independent 
check on the error in an obtained solution. Thus we would like to make 
H(tf) equal to a known constant and make H(t) equal to H(tf) for a general 
problem formulation. 

One solution to both problems is to make time a state in the problem, 

i.e., 


x n+ i = t (88) 

which changes the cost function to: 

J = 4>[x(t f )] + f 1 L(x,u) dt , (89) 

J 0 

where x here includes both the old states plus time, with constraints 

Xi = fi(x,u) i € l...n (90) 

i n +i = 1 (91) 

'Fjar^o), £(*/)] = 0 i 6 1 ...p (92) 

tfp+i = x n+ i(t 0 ) = 0 (93) 

Now if tf is a fixed constant r, we add: 

tfp+2 = *n+i(t/) - t = 0 (94) 

In any case, 'I' is not dependent on tf explicitly, so we always enforce: 

H(t f ) = 0. (95) 


Similarly, since explicit dependence on time has been removed from the prob- 
lem, 

^ = 0, (96) 
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even for a non-autonomous problem, so H(t ) = H{tf ) = 0 over the whole 
trajectory. 

The new state equation for the time state is enforced with a new costate. 
This costate is not as meaningless as one might assume at first glace. To find 
an interpretation for it, we first define a psuedo-Hamiltonian, H. 

H = L(x, u) + J2 A u ). (97) 

1 

Since along the optimal trajectory the states, costates, and control will 
be the same in either formulation and the new state simply substitutes for 
the running time, H(t) is the same as the Hamiltonian before the time state 
was added. The time derivative of the new Hamiltonian is then 

H = fi + A n+1 =0 (98) 

and the enforced boundary condition is: 

H(t f ) = H(t f ) + A n+1 (f/) = 0 (99) 

These two equations imply that 

A n+ 1 = -h (100) 


and 

X n+1 (tf) = - H(tf ) (101) 

which indicates that the extra costate obtained by adding a time state is in 
fact the old Hamiltonian. This relationship boils down to something very 
simple if the problem is autonomous. 

From the necessary conditions for optimality, the costate equation and 
boundary conditions for the new costate are 

A„+i(*o) = v p + 1 
A„+i(f/) = ^p+2 */ ^ ed 

= 0 tf free 

For an autonomous problem, 


dH 

@%n + 1 


= 0 . 


( 102 ) 
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This implies both that 

A n +i = 0 — > A n+1 (t) = A n+ i(t/) (103) 

and that A„+i(t/) will not appear in the other costate equations or the opti- 
mality equation, H u = 0 (and of course not the system dynamics.) Thus the 
only place A n+ i(t/) enters the problem is by balancing H(tf) in Eq. (101). 
Comparing this to Eqs. (84) and (82) it is clear that for autonomous prob- 
lems, free- or fixed-time, that the additional costate associated with the time 
state simply acts as a dummy Lagrange multiplier. So in adding the time 
state, effectively the Hamiltonian absorbs the right hand side of its terminal 
boundary condition so that the new Hamiltonian can always be zero. This is 
essentially what happens in non-autonomous problems except that it is not 
a constant being absorbed. 
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Figure 1. Dimension of Jacobian for Bryson and Ho problem. Sect 24 
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Figure 2 Sparsity of Jacobian for Bryson and Ho problem. Sect 24 
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2-norm of relative error in states for Bryson and Ho problem. Sect 2.4 
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Figure 4. 2-norm of relative error in control for Bryson and Ho problem. Sect 2.4 
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"iguie 5. 2-norm of relative error of states for Bryson and Ho problem. Sect 2.4 
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Figure 6. 2-norm of relative error of control for Bryson and Ho problem, Sect. 2.4 
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Figure 8. Error Analysis for Bryson and Ho problem. Sect 2.4, 6 elements 
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Figure 1 1. Error Analysis for Bryson and Ho problem, Pg 109, 7 elements 
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Figure 12. Error Analysis for Bryson and Ho problem Pg. 109, with 16 elements, optimized from 7 
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Figure It Error Analysis for Bryson and Ho problem, Pg 109, with 24 dements, optimized from 17 


